1. INTRODUCTION

1.1 Overview

In the last lesson we explored the fundamentals of spatial data and coordinate reference systems, and saw how the sf package enables some basic spatial analysis. Here, we'll dive deeper into spatial analysis with the sf package, including its ability to read and write spatial data formats.

The context of our analysis here is to explore the spatial relationships between our EPA air monitoring sites and some demographic data taken from the US Census.

1.2 Learning outcomes

At the end of this lesson, you should be able to:

- Read shapefiles and GeoJSON data into R as spatial features objects
- Explore and wrangle spatial features using "tidyverse" functions
- Aggregate spatial features using group_by() and summarize()

Specifically, we'll [re]examine the following:

- Reading shapefiles into R with sf
- Spatial aggregation with group_by and summarize or st_union
- Visualizing multiple datasets
- Changing CRS with st_transform
- Attribute joins with merge
- Spatial joins
- Geometry manipulations
  - Buffer
  - Convex hull
  - Voronoi polygons
- Selecting features by location (buffer and intersect)

1.3 Set up

The typical setup procedure: confirm the working directory and import libraries.

#Examine the working directory
getwd()
## [1] "C:/Workspace/Gits/SITES/EDA-Fall2022-base"
#Import libraries
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.2 --
## v ggplot2 3.3.6     v purrr   0.3.4
## v tibble  3.1.8     v dplyr   1.0.9
## v tidyr   1.2.0     v stringr 1.4.1
## v readr   2.1.2     v forcats 0.5.2
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(lubridate)
## 
## Attaching package: 'lubridate'
## 
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union
library(sf)
## Linking to GEOS 3.9.1, GDAL 3.4.3, PROJ 7.2.1; sf_use_s2() is TRUE
library(mapview)
library(RColorBrewer)
library(leaflet)

#Disable on-the-fly projections
sf::sf_use_s2(FALSE)
## Spherical geometry (s2) switched off

2. READING IN SPATIAL DATA

2.1 Read tabular data and convert to spatial features

First, we’ll read in our EPA air quality sites, as we did in the previous exercise.

#Read our EPA points into a spatial dataframe
epa_pm25_sites_sf <- read_csv('./Data/Raw/EPAair_PM25_NC2018_raw.csv') %>% 
  group_by(`Site Name`, COUNTY, SITE_LATITUDE, SITE_LONGITUDE) %>% 
  summarize(
    meanPM = mean(`Daily Mean PM2.5 Concentration`,na.rm=T),
    maxPM = max(`Daily Mean PM2.5 Concentration`,na.rm=T)
    ) %>% 
  st_as_sf(coords = c('SITE_LONGITUDE','SITE_LATITUDE'), crs=4269)
## Rows: 8983 Columns: 20
## -- Column specification --------------------------------------------------------
## Delimiter: ","
## chr  (9): Date, Source, UNITS, Site Name, AQS_PARAMETER_DESC, CBSA_NAME, STA...
## dbl (11): Site ID, POC, Daily Mean PM2.5 Concentration, DAILY_AQI_VALUE, DAI...
## 
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.
## `summarise()` has grouped output by 'Site Name', 'COUNTY', 'SITE_LATITUDE'. You can override using the `.groups` argument.
#Inspect the object
class(epa_pm25_sites_sf)
## [1] "sf"         "grouped_df" "tbl_df"     "tbl"        "data.frame"
#What is its CRS again?
st_crs(epa_pm25_sites_sf)$epsg
## [1] 4269
#Plot the data
ggplot(data=epa_pm25_sites_sf) +
  geom_sf()

2.2 Reading shapefiles into R with sf

The sf package can read many existing data formats, including ArcGIS shapefiles. I've added a few shapefiles to our Data folder: one of all US counties and another of the 8-digit hydrologic unit codes (HUCs) for NC. Here we explore how they are read into R as spatial features.

2.2.1 Read in and explore NC counties

Below we read in the USA counties shapefile, filtering for just the NC features (NC has a state FIPS code of "37"). We also see that sf plays nicely with "tidyverse" syntax (e.g. pipes) and functions (e.g. filter). The sf package also includes some new spatial methods for exploring our data (e.g. st_bbox(), which returns the bounding box enclosing all spatial features).

#Read the US counties shapefile
counties_sf <- st_read('./Data/Spatial/cb_2018_us_county_20m.shp') %>% 
  filter(STATEFP == 37) #Filter for just NC Counties
## Reading layer `cb_2018_us_county_20m' from data source 
##   `C:\Workspace\Gits\SITES\EDA-Fall2022-base\Data\Spatial\cb_2018_us_county_20m.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 3220 features and 9 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -179.1743 ymin: 17.91377 xmax: 179.7739 ymax: 71.35256
## Geodetic CRS:  NAD83
#Reveal the column names
colnames(counties_sf)
##  [1] "STATEFP"  "COUNTYFP" "COUNTYNS" "AFFGEOID" "GEOID"    "NAME"    
##  [7] "LSAD"     "ALAND"    "AWATER"   "geometry"
#Check the CRS
st_crs(counties_sf)
## Coordinate Reference System:
##   User input: NAD83 
##   wkt:
## GEOGCRS["NAD83",
##     DATUM["North American Datum 1983",
##         ELLIPSOID["GRS 1980",6378137,298.257222101,
##             LENGTHUNIT["metre",1]]],
##     PRIMEM["Greenwich",0,
##         ANGLEUNIT["degree",0.0174532925199433]],
##     CS[ellipsoidal,2],
##         AXIS["latitude",north,
##             ORDER[1],
##             ANGLEUNIT["degree",0.0174532925199433]],
##         AXIS["longitude",east,
##             ORDER[2],
##             ANGLEUNIT["degree",0.0174532925199433]],
##     ID["EPSG",4269]]
#Count the features
nrow(counties_sf)
## [1] 100
#Reveal the extent of this dataset via the st_bbox() function
st_bbox(counties_sf)
##      xmin      ymin      xmax      ymax 
## -84.32187  33.85111 -75.45866  36.58812
#View the data
head(counties_sf)
#Plot the data, colored by the area of water in each county
mapView(counties_sf, zcol = "AWATER")

2.2.2 Read in and explore 8-digit HUC watersheds for NC

Now you try: Read in the NC 8-Digit HUC dataset: ./Data/Spatial/NCHUC8.shp into a variable named huc8_sf. What CRS does this dataset use? Is it the same as the counties dataset? What columns are included in this dataset? What do these features look like on a map?

#Read the shapefile into an sf dataframe named "huc8_sf"
huc8_sf <- st_read('./Data/Spatial/NCHUC8.shp')
## Reading layer `NCHUC8' from data source 
##   `C:\Workspace\Gits\SITES\EDA-Fall2022-base\Data\Spatial\NCHUC8.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 53 features and 7 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -84.32162 ymin: 33.83437 xmax: -75.45998 ymax: 36.58841
## Geodetic CRS:  WGS 84
#Reveal the columns
names(huc8_sf)
## [1] "FID"        "HUC_8"      "SUBBASIN"   "ACRES"      "SQ_MILES"  
## [6] "HU_8_STATE" "DWQ_Basin"  "geometry"
#Check the CRS
st_crs(huc8_sf)
## Coordinate Reference System:
##   User input: WGS 84 
##   wkt:
## GEOGCRS["WGS 84",
##     DATUM["World Geodetic System 1984",
##         ELLIPSOID["WGS 84",6378137,298.257223563,
##             LENGTHUNIT["metre",1]]],
##     PRIMEM["Greenwich",0,
##         ANGLEUNIT["degree",0.0174532925199433]],
##     CS[ellipsoidal,2],
##         AXIS["latitude",north,
##             ORDER[1],
##             ANGLEUNIT["degree",0.0174532925199433]],
##         AXIS["longitude",east,
##             ORDER[2],
##             ANGLEUNIT["degree",0.0174532925199433]],
##     ID["EPSG",4326]]
#Examine some records
head(huc8_sf)
#View the data as a map, colored by the acreage in each
ggplot(data=huc8_sf) +
  geom_sf(aes(fill=ACRES)) + 
  scale_fill_continuous(type='viridis')

2.2.3 Challenge!

Challenge: Read in the NC 8-Digit HUC dataset again, but this time filter the data so the result only includes the one with a SUBBASIN value of ‘Upper Neuse’. Then map this. Double bonus if you can map this HUC8 on top of the other HUC8s, showing the Upper Neuse as purple and the others as orange.

#Read the shapefile into an sf dataframe
huc8_sf.subset <- st_read('./Data/Spatial/NCHUC8.shp') %>% 
  filter(SUBBASIN == 'Upper Neuse')
## Reading layer `NCHUC8' from data source 
##   `C:\Workspace\Gits\SITES\EDA-Fall2022-base\Data\Spatial\NCHUC8.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 53 features and 7 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -84.32162 ymin: 33.83437 xmax: -75.45998 ymax: 36.58841
## Geodetic CRS:  WGS 84
#Create a map
ggplot() + 
  geom_sf(data=huc8_sf,fill='orange') + 
  geom_sf(data=huc8_sf.subset, fill='purple')

2.2.4 Reading in GeoJSON data

More and more spatial data are appearing online in formats that allow us to connect directly to the data instead of downloading local copies to our workspace. An example is the Homeland Infrastructure Foundation-Level Data (HIFLD). Their open data site (https://hifld-geoplatform.opendata.arcgis.com/) has links to many datasets.

When the data is served in GeoJSON format, we can ingest it directly into R. Follow these steps:

- Navigate to https://hifld-geoplatform.opendata.arcgis.com/
- Scroll down to the Explore Categories area and select Energy (for example)
- Search for power plants
- Select the first return and open its link
- Locate the APIs dropdown and copy the link to the GeoJSON option

If you have difficulty, the link you want is:

- https://opendata.arcgis.com/datasets/ee0263bd105d41599be22d46107341c3_0.geojson

This is a link to a spatial dataset in GeoJSON format. Now let's read the dataset in and explore it:

powerplants_sf <- st_read('https://opendata.arcgis.com/datasets/ee0263bd105d41599be22d46107341c3_0.geojson')
## Reading layer `Power_Plants' from data source 
##   `https://opendata.arcgis.com/datasets/ee0263bd105d41599be22d46107341c3_0.geojson' 
##   using driver `GeoJSON'
## Simple feature collection with 11810 features and 44 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: -171.7124 ymin: -14.32451 xmax: 145.741 ymax: 71.29209
## Geodetic CRS:  WGS 84
#Reveal the field names
colnames(powerplants_sf)
##  [1] "OBJECTID"   "PLANT_CODE" "NAME"       "ADDRESS"    "CITY"      
##  [6] "STATE"      "ZIP"        "TELEPHONE"  "TYPE"       "STATUS"    
## [11] "COUNTY"     "COUNTYFIPS" "COUNTRY"    "LATITUDE"   "LONGITUDE" 
## [16] "NAICS_CODE" "NAICS_DESC" "SOURCE"     "SOURCEDATE" "VAL_METHOD"
## [21] "VAL_DATE"   "WEBSITE"    "OPERATOR"   "OPERAT_ID"  "OPER_CAP"  
## [26] "SUMMER_CAP" "WINTER_CAP" "PLAN_CAP"   "RETIRE_CAP" "GEN_UNITS" 
## [31] "PLAN_UNIT"  "RETIR_UNIT" "PRIM_FUEL"  "SEC_FUEL"   "COAL_USED" 
## [36] "NGAS_USED"  "OIL_USED"   "NET_GEN"    "CAP_FACTOR" "SUB_1"     
## [41] "SUB_2"      "LINES"      "SOURCE_LAT" "SOURC_LONG" "geometry"
#How many records
nrow(powerplants_sf)
## [1] 11810
#View on a map - specify the geometry column to draw faster
plot(powerplants_sf$geometry)

#Filter for just powerplants found in NC
nc_powerplants_sf <- powerplants_sf %>% 
  filter(STATE == "NC") 

#Have a look at the variety of types (and the number of each)
nc_powerplants_sf %>% 
  st_drop_geometry() %>% 
  count(TYPE, sort=TRUE)
#Most are solar farms; let's remove those
nc_powerplants_sf <- nc_powerplants_sf %>% 
  filter(TYPE != "SOLAR PHOTOVOLTAIC")

#Map the plants, colored by primary fuel type
ggplot(nc_powerplants_sf) + 
  geom_sf(aes(color = PRIM_FUEL)) 

3. WORKING WITH SPATIAL DATA

3.1 Joining attributes to spatial features

Joining data to spatial features works the same as joining tables: we just need a common attribute to link the two datasets. Here, we'll add demographic data to our Census county features, using the combined state and county FIPS code as the common attribute.

The data we’ll add here is the CDC’s Social Vulnerability Index data. Information on this dataset is available here:

Social Vulnerability Data:

- https://www.atsdr.cdc.gov/placeandhealth/svi/documentation/SVI_documentation_2018.html
- https://www.atsdr.cdc.gov/placeandhealth/svi/documentation/pdf/SVI2018Documentation-H.pdf

Brief notes:

- "E_" columns are estimates; "M_" columns are margins of error
- "EP_" columns are estimates expressed as percentages
- "SPL_THEME1" is the sum of a series
- "RPL_THEME1" is a percentile ranking

We’ll focus on just a few variables: Estimates of people in poverty (“E_POV”) and of minority population (“E_MINRTY”), keeping the location attributes as well.

#Read the 2018 SVI county-level dataset for NC
svi2018_nc_raw <- read.csv(
  'https://svi.cdc.gov/Documents/Data/2018_SVI_Data/CSV/States_Counties/NorthCarolina_COUNTY.csv',
  colClasses = c('FIPS' = 'factor')) %>% 
  select(COUNTY, FIPS, LOCATION, E_TOTPOP, E_POV, E_MINRTY)

#Check structure
str(svi2018_nc_raw)
## 'data.frame':    100 obs. of  6 variables:
##  $ COUNTY  : chr  "Currituck" "Dare" "Haywood" "Clay" ...
##  $ FIPS    : Factor w/ 100 levels "37001","37003",..: 27 28 44 22 90 88 15 55 85 57 ...
##  $ LOCATION: chr  "Currituck County, North Carolina" "Dare County, North Carolina" "Haywood County, North Carolina" "Clay County, North Carolina" ...
##  $ E_TOTPOP: int  25796 35741 60433 10813 226694 33513 10447 81441 45905 34410 ...
##  $ E_POV   : int  2562 2929 8304 1700 19921 4776 1172 11593 6166 5679 ...
##  $ E_MINRTY: int  3304 4447 4300 680 62132 3249 2047 12168 4022 3684 ...
#Join the SVI attributes to the county spatial features
counties_sf_join <-  merge(x = counties_sf,
                           y = svi2018_nc_raw, 
                           by.x = "GEOID", 
                           by.y = "FIPS" )

#Tidyverse version of the join
counties_sf_join <- counties_sf %>% 
  left_join(svi2018_nc_raw, by = c("GEOID" = "FIPS") )

#View with mapview
mapview(counties_sf_join, 
        zcol = 'E_POV', 
        col.regions = brewer.pal(2, 'RdBu')) + 
  mapview(epa_pm25_sites_sf, cex = 'maxPM')
## Warning in brewer.pal(2, "RdBu"): minimal value for n is 3, returning requested palette with 3 different levels
## Warning: Found less unique colors (3) than unique zcol values (99)! 
## Interpolating color vector to match number of zcol values.
#view with ggplot
ggplot() + 
  geom_sf(data=counties_sf_join,aes(fill = E_POV),alpha=0.3) + 
  scale_fill_gradient2(low="red",high="blue",midpoint = 60000) + 
  geom_sf(data=epa_pm25_sites_sf)

Now you try: The URL 'https://raw.githubusercontent.com/ENV859/EnviroAtlasData/main/Wind_Energy.csv' links to EPA's EnviroAtlas data on the amount of wind energy estimated at the HUC12 scale. You need to load this data, group by HUC8 (summing the wind energy of each HUC12 within a given HUC8), and join the result with the HUC8 spatial features dataset.

- Be sure, as above, to read the HUC_12 column in as a factor so it doesn't default to a numeric column.

#Compute HUC8 wind energy
wind_raw <- read.csv('https://raw.githubusercontent.com/ENV859/EnviroAtlasData/main/Wind_Energy.csv',
  colClasses = c('HUC_12' = 'factor')) %>%  
  mutate(HUC_8 = substr(HUC_12,1,8)) %>% 
  group_by(HUC_8) %>% 
  summarize(SumWindEnergy = sum(AvgWindEnergy))
  
#Join to HUC_8 features
huc8_sf_join <- merge(x = huc8_sf,
                      y = wind_raw,
                      by.x = 'HUC_8',
                      by.y = 'HUC_8')

#View the outputs
ggplot(data=huc8_sf_join) +
  geom_sf(aes(fill=SumWindEnergy)) +
  labs(
    title='Wind energy by HUC 8',
    fill ='MW/year'
  )

3.2 Spatial data aggregation with group_by and summarize

Here, we’ll explore another way in which sf works well with tidyverse functions. Specifically, we’ll see how the group_by and summarize functions work on spatial features much like they do with tabular records. In GIS, this is termed “dissolving” features because we are dissolving away boundaries shared by features with a common attribute value.

In our case, all of our county features share the same “STATEFP” value, so we’ll effectively dissolve away all county boundaries, leaving us with one feature: the outline of North Carolina.

#Aggregate the data using group_by and summarize, just as you would a non-spatial dataframe
state_sf <- counties_sf %>% 
  group_by(STATEFP) %>% 
  summarize(ALAND = sum(ALAND))
## although coordinates are longitude/latitude, st_union assumes that they are planar
#View the data
mapview(state_sf)

Now you try: Aggregate the HUC_8 data on the DWQ_Basin attribute, computing the sum of the ACRES and SQ_MILES fields, and view the result.

#List the unique values in the DWQ_Basin field
unique(huc8_sf$DWQ_Basin)
##  [1] "Roanoke"          "Chowan"           "Pasquotank"       "Tar-Pamlico"     
##  [5] "Neuse"            "White Oak"        "Cape Fear"        "Yadkin Pee Dee"  
##  [9] "Lumber"           "Catawba"          "Broad"            "Savannah"        
## [13] "New"              "Watauga"          "French Broad"     "Little Tennessee"
## [17] "Hiwassee"
#Summarize on DWQ Basin value
huc2_sf <- huc8_sf %>% 
  group_by(DWQ_Basin) %>% 
  summarize(
    acres = sum(ACRES),
    sq_miles = sum(SQ_MILES)
  )
## although coordinates are longitude/latitude, st_union assumes that they are planar
#Map the data
ggplot() + geom_sf(data=huc2_sf,aes(fill=acres))

3.3 Transforming coordinate reference systems of datasets

3.3.1 Review: Geographic vs projected data

When grouping data above, you may have noticed warnings that “although coordinates are longitude/latitude, st_union assumes that they are planar”. This gets us back to our conversation on projections.

Recall that we have two basic categories of coordinate systems: geographic, where coordinates are measured in angles (e.g. degrees of latitude and longitude); and projected, where coordinates are measured in linear units (e.g. meters or feet). The former maps our features on a sphere, the latter on a plane. The process of migrating spherical coordinates to planar ones is called "projecting", and it involves a good deal of math as well as various assumptions about the shape of the Earth, which is not a perfect sphere.
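To make the distinction concrete, here's a minimal sketch (using an arbitrary point near Durham, NC; the coordinates are illustrative, not from our dataset) showing how the same location's coordinate values change between a geographic and a projected CRS:

```r
library(sf)

#An arbitrary point near Durham, NC in geographic coordinates (NAD83, EPSG:4269)
pt_geo <- st_sfc(st_point(c(-78.94, 36.00)), crs = 4269)
st_coordinates(pt_geo)  #Angular units: longitude/latitude in degrees

#The same location projected to UTM Zone 17N (EPSG:26917)
pt_utm <- st_transform(pt_geo, crs = 26917)
st_coordinates(pt_utm)  #Linear units: easting/northing in meters
```

The point itself hasn't moved; only the reference system used to express its location has changed.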

3.3.2 Reasons to project data before doing spatial analysis

Projecting data has trade-offs. On the negative side, projecting distorts the data (for the same reason that you can't flatten an orange peel without stretching or tearing it). But on the positive side, planar data are much easier to work with, both mathematically and visually (on our flat screens as well as on flat printed maps).

What I'm getting at is: it's often a good idea to project data before doing spatial analysis. Packages like sf are better suited to planar data than spherical data, and measuring distances and areas is much more sensible in linear units than in angular ones.

Note: GIS software is getting better at working with angular values and also with visualizing 3D (i.e. spherical) data. It’s not quite ubiquitous yet, but keep your eye on this.
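As a quick illustration of the measurement point above, here's a minimal sketch (using two arbitrary points at roughly NC's latitude; they're illustrative, not from our data) showing how a projected CRS gives distances directly in meters:

```r
library(sf)

#Two arbitrary points one degree of longitude apart at roughly NC's latitude
pts <- st_sfc(st_point(c(-79, 36)), st_point(c(-78, 36)), crs = 4326)

#Project to UTM Zone 17N so coordinates (and distances) are in meters
pts_utm <- st_transform(pts, crs = 26917)
st_distance(pts_utm[1], pts_utm[2])  #Roughly 90 km, returned in meters
```

In the geographic CRS those points differ by "one degree", a unit whose ground length shrinks as you move toward the poles; after projecting, the same measurement comes back directly in meters.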

3.3.3 Projecting your data

On top of projecting any geographic data to accommodate the limitations of the software, you'd also be wise to ensure all your datasets share the same CRS. This can be done with the st_transform command, supplying the EPSG code of the CRS you want your data to be in. Let's get our main five sf objects into a consistent CRS.

#Convert all to UTM Zone 17 (crs = 26917)
epa_sf_utm <-      st_transform(epa_pm25_sites_sf, crs = 26917)
counties_sf_utm <- st_transform(counties_sf, crs = 26917)
state_sf_utm <-    st_transform(state_sf,crs = 26917)
huc8_sf_utm <-     st_transform(huc8_sf, crs = 26917)
huc2_utm <-        st_transform(huc2_sf, crs = 26917)

3.4 Clipping and intersecting data

Now that our data are all in a common planar coordinate system, let's examine what we can do with them. The sf cheat sheet provides a nice summary of the types of operations, and you'll see we've already used a few of the functions listed there. Now let's explore some more, starting with clipping one dataset with another.

In the exercise below, we'll subset one of the basins (the Neuse) and explore the counties that spatially overlap with it. You'll see there are two ways to subset data spatially.

#Select the Neuse basin from the aggregated basin dataset
neuse_sf <- huc2_utm %>% 
  filter(DWQ_Basin == "Neuse")

#Start building a map object
myMap = mapview(neuse_sf, 
                col.regions = 'yellow', 
                alpha.regions = 0.2,
                map.types = "CartoDB.Positron",
                legend = FALSE)
#Show the map
myMap
#Select intersecting counties using matrix subsetting
neuse_intersect_1 <- counties_sf_utm[neuse_sf,]

#Select intersecting counties using the `filter()` command
neuse_intersect_2 <- counties_sf_utm %>% 
  filter(st_intersects(x = ., y = neuse_sf, sparse = FALSE))

#View the result
myMap + mapview(neuse_intersect_2, alpha.regions = 0)
#Actually intersect the counties (not just select those that intersect)
neuse_counties_sf <- neuse_sf %>% 
  st_intersection(counties_sf_utm)
## Warning: attribute variables are assumed to be spatially constant throughout all
## geometries
myMap + mapview(neuse_counties_sf, alpha.regions = 0)
#Update the area of each feature, now that some are clipped
neuse_counties_sf <- neuse_counties_sf %>% 
  mutate(Area_m2 = as.numeric(st_area(neuse_counties_sf$geometry)))
  
mapview(neuse_counties_sf, zcol='Area_m2')

Now you try: Select the counties in the “Triangle” (Chatham, Durham, Orange, and Wake). Then select the HUC_8s that touch these counties. And finally, select the portions of the HUC_8s that occur within these counties.

#Select the Triangle counties from the counties dataset
triangle_co <- counties_sf_utm %>% 
  filter(NAME %in% c('Chatham','Durham','Orange','Wake'))
  
#Grab the intersecting HUC_8s
#triangle_HUCs <- huc8_sf_utm %>% 
#  filter(st_intersects(x = ., y = triangle_co, sparse = F))
triangle_HUCs <- huc8_sf_utm[triangle_co,]

ggplot() + 
  geom_sf(data = triangle_co, color='grey') + 
  geom_sf(data = huc8_sf_utm, color='green',fill=NA) +
  geom_sf(data = triangle_HUCs, color='blue', fill=NA)

#Intersect the HUC_8s
triangle_HUCs.clipped <- triangle_co %>% 
  st_intersection(huc8_sf_utm)
## Warning: attribute variables are assumed to be spatially constant throughout all
## geometries
mapview(triangle_HUCs.clipped)

3.5 Geometry manipulations

Here, we'll explore some of the geometry operations from the sf cheat sheet so you can get a feel for how they work:

- Extracting the centroids of features
- Buffering features
- Unioning many features into a single multi-part feature
- Computing convex hulls from multi-part features
- Computing Voronoi polygons from multi-part features

#Select the triangle counties into a new sf data frame
triCo <- counties_sf_utm %>% 
  filter(NAME %in% c("Durham","Wake", "Orange", "Chatham")) 

#Plot
myMap = ggplot() + 
  geom_sf(data = triCo)
myMap

#Extract the centroids of the selected features and show them
triCo_centroids <-  st_centroid(triCo)
## Warning in st_centroid.sf(triCo): st_centroid assumes attributes are constant
## over geometries of x
myMap <- myMap + geom_sf(data = triCo_centroids, color = 'blue')
myMap

#Buffer the centroids outward 2km and add them to our map
triCo_centroids_2km <- st_buffer(triCo_centroids, 2000)
myMap <- myMap + geom_sf(data = triCo_centroids_2km, color = 'orange', fill=NA)
myMap

#Buffer the counties inward 2km
triCo_in2km <- st_buffer(triCo, -2000)
myMap <- myMap + geom_sf(data = triCo_in2km, color = 'green', fill=NA)
myMap

#Combine the centroids into one feature and construct a convex hull around them
triCo_centroids_chull <- triCo_centroids %>% 
  st_union() %>% 
  st_convex_hull()
myMap <- myMap + geom_sf(data = triCo_centroids_chull, color = 'red', fill=NA)
myMap

#Combine the centroids into one feature and draw voronoi polygons
triCo_centroids_voronoi <- triCo_centroids %>% 
  st_union() %>% 
  st_voronoi()
myMap <- myMap + geom_sf(data = triCo_centroids_voronoi, color = 'purple', fill=NA)
myMap

3.6 Spatial selection

We can also use location to select features.

#User coordinates
userLat = 36.0045442
userLng = -78.9426381

#Create a simple features point geometry from the point
theSite_sfp <- st_point(c(userLng,userLat))

#Create a simple features column from the point geometry object
theSite_sfc <- st_sfc(theSite_sfp, crs = 4326)

#Transform the site point to match the CRS of the counties dataset
theSite_sfc_transformed <- st_transform(theSite_sfc, crs = st_crs(counties_sf_utm))

#Create a boolean mask 
resultMask <- st_intersects(counties_sf_utm, 
                            theSite_sfc_transformed,
                            sparse = FALSE) #The `sparse` option returns a Boolean mask

#Filter the counties dataset using the boolean mask
selCounties <- counties_sf_utm[resultMask,]

#Map the results
mapView(selCounties)

Question: How might we use the st_buffer function to show all counties within 30km of the site?

#Buffer the site by 30km, then select the counties intersecting the buffer
theSite_30km <- st_buffer(theSite_sfc_transformed, dist = 30000)
the_site_counties <- counties_sf_utm[theSite_30km,]
mapView(the_site_counties)

4. VISUALIZATION

Lastly, let's take a deeper dive into the various ways to visualize our spatial data. We've done a bit of this already, but let's formalize and expand on what we've covered. You can take these visualizations much further than what we present here, but these examples should reveal the basic structure and a few "gotchas" when constructing these plots.

4.1 Visualizing Multiple Datasets with ggplot

Loading the sf package adds the geom_sf geometry to ggplot. This geom works much like other geoms, but with a few additional options. Here we see that the order of plotting is important: layers are drawn in the order listed, so later layers appear on top.

#Wrong order
ggplot()  +
  geom_sf(data = epa_sf_utm, color='white', size=2) +
  geom_sf(data = counties_sf_utm, aes(fill = ALAND), color = 'white')  +
  geom_sf(data = state_sf_utm, color='red',size=2) + 
  scale_fill_gradient(low="yellow", high="darkgreen")

#Right order
ggplot() +
  geom_sf(data = state_sf_utm, color='red',size=2) +
  geom_sf(data = counties_sf_utm, aes(fill = ALAND), color = 'white')  +
  geom_sf(data = epa_sf_utm, color='blue', size=2) + 
  scale_fill_gradient(low="yellow", high="darkgreen")

4.2 Plotting with Leaflet

Leaflet is the most powerful of the three mapping approaches we've used (ggplot, mapview, and leaflet). However, it requires getting all our data back into the WGS 84 CRS.

4.2.1 Multiple layers in Leaflet

# Convert all to WGS84 (crs = 4326)
EPAair_wgs84 <- st_transform(epa_pm25_sites_sf, crs = 4326)
counties_WGS84 <- st_transform(counties_sf_utm, crs = 4326)
state_WGS84 <- st_transform(state_sf, crs = 4326)
huc8s_WGS84 <- st_transform(huc8_sf, crs = 4326)
huc2_WGS84 <- st_transform(huc2_sf, crs = 4326)

#Now plot with leaflet: no errors as all layers are in Leaflet's native CRS (WGS84)
leaflet() %>% addTiles() %>% 
  addPolygons(data=counties_WGS84,weight=1,color='red') %>% 
  addPolygons(data=huc8s_WGS84,weight=1)

4.2.2 Changing basemaps in Leaflet

Tip: See http://leaflet-extras.github.io/leaflet-providers/ for other basemaps

leaflet() %>% 
  addProviderTiles(providers$Esri.DeLorme) %>%  
  addPolygons(data = counties_WGS84, 
              color = "orange", 
              weight = 1, 
              smoothFactor = 0.5,   
              opacity = 1.0, 
              fillOpacity = 0.5,
              fillColor = ~colorQuantile("YlGnBu", ALAND)(ALAND)) %>% 
  addPolygons(data = huc2_WGS84, 
              color=NA, 
              weight = 2) %>% 
  addMarkers(data=EPAair_wgs84,
             popup = ~as.character(`Site Name`))

4.2.3 Leaflet - linked and synced plots

m1 <- leaflet() %>% 
  addTiles() %>%  
  addPolygons(data = counties_WGS84, color = "orange", weight = 1, smoothFactor = 0.5,   
              opacity = 1.0, fillOpacity = 0.5,
              fillColor = ~colorQuantile("YlOrRd", ALAND)(ALAND)) %>% 
  addMarkers(data=EPAair_wgs84,popup = ~as.character(`Site Name`))


m2 <- leaflet() %>% 
  addProviderTiles(providers$Stamen.TonerHybrid) %>% 
  addPolygons(data = huc8s_WGS84,weight=0.2,color='red') %>% 
  addCircleMarkers(data=EPAair_wgs84,
                   radius=(~meanPM*2),
                   stroke = FALSE, 
                   fillOpacity = 0.3,
                   popup = ~as.character(`Site Name`))



#install.packages("leafsync")
library(leafsync)

#Create an lattice view of the two leaflet maps
latticeview(m1, m2)
#Create a synchronized view...
sync(m1,m2)